Quantification Learning with Applications to Mortality Surveillance
\chapter*{Abstract}
This thesis is motivated by the problem of estimating the cause-specific mortality fraction (CSMF) for child deaths in Mozambique. In countries where many deaths are not assigned a cause of death, the CSMF is often estimated by conducting a verbal autopsy (VA) for a large number of deaths. A cause for each VA is then assigned via one or more computer-coded verbal autopsy (CCVA) algorithms, and these cause assignments are aggregated to estimate the CSMF. We show that CSMF estimation from CCVAs is poor if there is substantial misclassification due to CCVAs being informed by non-local data. We develop a parsimonious Bayesian hierarchical model that uses a small set of labeled data comprising deaths with both a VA and a gold-standard cause of death (COD). The labeled data are used to learn the misclassification rates of one or multiple CCVAs, and in turn these estimated rates are used to produce a calibrated CSMF estimate. A shrinkage prior ensures that the CSMF estimate from our Bayesian model coincides with that from a CCVA in the case of no labeled data. To handle probabilistic CCVA predictions and labels, we develop an estimating equations approach that uses the Kullback-Leibler loss function for transformation-free regression with a compositional outcome and predictor. We then use Bayesian updating of this loss function, which allows for calibrated CSMF estimation from probabilistic predictions and labels. This method is not limited to CSMF estimation and can be used for general quantification learning, that is, prevalence estimation for a test population using predictions from a classifier derived from training data. Finally, we obtain CSMF estimates for child deaths in Mozambique by applying all of the developed methods to VA data collected from the Countrywide Mortality Surveillance for Action (COMSA)-Mozambique and to VA and gold-standard COD data collected from the Child Health and Mortality Prevention project.
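The calibration idea described above can be illustrated with a short worked identity. This is only a sketch under assumed notation: the symbols $p$, $q$, $M$, and $C$ are introduced here for illustration and are not taken from the thesis. Let $p = (p_1, \dots, p_C)$ denote the true CSMF over $C$ causes, and let $M_{ij}$ denote the probability that a CCVA assigns cause $j$ to a death whose true cause is $i$. The expected fraction of deaths that the CCVA assigns to cause $j$ is then
\[
  q_j \;=\; \sum_{i=1}^{C} M_{ij}\, p_i , \qquad j = 1, \dots, C ,
  \qquad \text{with} \quad \sum_{j=1}^{C} M_{ij} = 1 \ \text{for each } i .
\]
Under this reading, the uncalibrated estimate reports $q$ directly, whereas a calibrated estimate uses the labeled data to learn $M$ and then recovers $p$ from the relation above. If $M$ is the identity matrix (no misclassification), then $q = p$. A prior that shrinks $M$ toward the identity would therefore return the CCVA estimate when no labeled data are available, which is one plausible way to interpret the shrinkage property stated in the abstract.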
Large-scale identification of undiagnosed hepatic steatosis using natural language processing
Summary: Background: Nonalcoholic fatty liver disease (NAFLD) is a major cause of liver-related morbidity in people with and without diabetes, but it is underdiagnosed, posing challenges for research and clinical management. Here, we determine whether natural language processing (NLP) of data in the electronic health record (EHR) can identify undiagnosed patients with hepatic steatosis based on pathology and radiology reports. Methods: A rule-based NLP algorithm was built using a Linguamatics literature text mining tool to search 2.15 million pathology reports and 2.7 million imaging reports in the Penn Medicine EHR from November 2014 through December 2020 for evidence of hepatic steatosis. For quality control, two independent physicians manually reviewed randomly chosen biopsy and imaging reports (n = 353, PPV 99.7%). Findings: After exclusion of individuals with other causes of hepatic steatosis, 3007 patients with biopsy-proven NAFLD and 42,083 patients with imaging-proven NAFLD were identified. Interestingly, elevated ALT was not a sensitive predictor of the presence of steatosis, and only half of the biopsied patients with steatosis ever received an ICD diagnosis code for NAFLD/NASH. There was a robust association between PNPLA3 and TM6SF2 risk alleles and steatosis identified by NLP. We identified 234 disorders that were significantly over- or underrepresented in all subjects with steatosis and identified changes in serum markers (e.g., GGT) associated with the presence of steatosis. Interpretation: This study demonstrates the clear feasibility of NLP-based approaches for identifying patients whose steatosis was indicated in imaging and pathology reports within a large healthcare system, and it uncovers undercoding of NAFLD in the general population. Identification of patients at risk could link them to improved care and outcomes.
Funding: The study was funded by US and German funding sources that provided financial support only and had no influence or control over the research process.
Early Noninvasive Detection of Response to Targeted Therapy in Non-Small Cell Lung Cancer
With the advent of precision oncology, there is an urgent need to develop improved methods for rapidly detecting responses to targeted therapies. Here, we have developed an ultrasensitive measure of cell-free tumor load using targeted and whole-genome sequencing approaches to assess responses to tyrosine kinase inhibitors in patients with advanced lung cancer. Analyses of 28 patients treated with anti-EGFR or HER2 therapies revealed a bimodal distribution of cell-free circulating tumor DNA (ctDNA) after therapy initiation, with molecular responders having nearly complete elimination of ctDNA (>98%). Molecular nonresponders displayed limited changes in ctDNA levels posttreatment and experienced significantly shorter progression-free survival (median 1.6 vs. 13.7 months, P < 0.0001; HR = 66.6; 95% confidence interval, 13.0-341.7), which was detected on average 4 weeks earlier than CT imaging. ctDNA analyses of patients with radiographic stable or nonmeasurable disease improved prediction of clinical outcome compared with CT imaging. These analyses provide a rapid approach for evaluating therapeutic response to targeted therapies and have important implications for the management of patients with cancer and the development of new therapeutics. Significance: Cell-free tumor load provides a novel approach for evaluating longitudinal changes in ctDNA during systemic treatment with tyrosine kinase inhibitors and serves an unmet clinical need for real-time, noninvasive detection of tumor response to targeted therapies before radiographic assessment. See related commentary by Zou and Meyerson, p. 1038.